This paper describes the design and implementation of a special-purpose
distributed data management system. The design and implementation
were parts of a study to evaluate the database management
needs of software-defined logical subnetworks. The paper describes 
the authorization model used to define logical subnetworks and the 
related subnetwork management transactions. The definition of the 
subnetworks and their service capabilities are stored in a distributed 
database. The distributed database architecture and the imple- 
mented software architecture are described. The requirement to de- 
sign and implement within a specific time frame has kept the design 
simple, but the nature of the application dictated that we consider 
many aspects of the more general distributed data-management
problem. The database management issues that are addressed in this 
paper, in the context of transaction processing, include multicopy 
updates, concurrency control, and crash recovery. A version of the 
primary node concept for multicopy updates was adopted. Data 
inconsistencies, created by premature termination of transaction 
processing (e.g., system crash), are detected and removed by the 
software. 

I. INTRODUCTION 

The advantages of distributed data processing in general, and dis- 
tributed data management in particular, have been presented in many 
publications. In spite of wide interest in its potential cost benefits, 
distributed data management has met little acceptance in the field
because of its potential impact on the way organizations are accustomed
to managing their data-processing facilities. 1,2 Issues such as
who is responsible for purchasing of equipment, or who is responsible 
for availability of services have to be revisited in the distributed 
processing environment. This paper presents the design of a distributed 
database management system. The implementation was part of a 
special study investigating the feasibility of software-defined logical 
subnetworks. However, the design is general in that many issues of 
distributed database management are addressed and solved. 

The database contains the definition of logical subnetworks, their 
users, and the service features to which the users subscribe. In soft- 
ware-defined subnetworks, customers can be provided direct control 
over their own subnetwork. This aspect of the service is referred to as 
customer subnetwork management. The customer subnetwork capa- 
bilities deny customers control over other customers' subnetworks, 
and protect the network from customer-initiated activities. The data- 
base architecture selected, in the context of a nation-wide service, can 
be applied to a variety of communication services. 

Section II defines the requirements placed on the database manage- 
ment system by the application. Section III describes the database 
architecture selected in response to the reliability and performance 
requirements of the service. Section IV describes the software archi- 
tecture developed to maintain the distributed databases. Section V 
presents solutions to major database issues. 

II. APPLICATION 

The specific application under study was in the context of a com- 
munication service — a service assumed to be operational 24 hours a 
day, 7 days per week. 

2.1 Definitions 

2.1.1 The network 

The service is handled by a collection of physical nodes. Each 
physical node supports one or more logical nodes, called service areas 
(sas). A service area cannot span physical nodes. A service area 
provides service through a set of network addresses. An sa is identified 
by a six-digit number. A network address is identified by a ten-digit 
number that includes the six digits of the associated sa. For example, 
201-777 is a service area name, and 201-777-4444 is a network address 
associated with it. Figure 1 shows the network as a collection of 
physical nodes, each supporting one or more service areas. 
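
The naming rule above can be illustrated with a minimal sketch. It assumes the hyphenated 3-3-4 digit layout used in the example; the function name `service_area_of` is hypothetical and does not come from the implementation described here.

```python
import re

# Assumed layout: ten digits written as NNN-NNN-NNNN, as in the example above.
ADDRESS_RE = re.compile(r"^(\d{3})-(\d{3})-(\d{4})$")


def service_area_of(network_address: str) -> str:
    """Return the six-digit service area name embedded in a network address."""
    match = ADDRESS_RE.match(network_address)
    if match is None:
        raise ValueError(f"not a ten-digit network address: {network_address!r}")
    # The first six digits of the address are the service area name.
    return f"{match.group(1)}-{match.group(2)}"


print(service_area_of("201-777-4444"))  # 201-777
```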

Fig. 1: Nodes and service areas in the network.

2.1.2 Users

A user is an entity that can be provided service. Users are grouped
according to their service capabilities (e.g., set up calls, terminate calls,
disable service to another user, etc.). At any one time, a user may only 
be a member of a single group. All the users of a given group share the 
same set of capabilities. A customer account is a collection of user 
groups. A user is identified by the combination of a group name and 
a home address. The home address of the user is a ten-digit number 
and it identifies the service area that is normally used by him/her to 
access service. But, a user may access (if authorized) network services 
through network addresses associated with any other service area. The 
customer account name is part of the group name. For example, BELL, 
BELL.RESEARCH, and BELL.RESEARCH:201-582-9999 are account,
group, and user names, respectively. An instance of a user requesting 
service at a particular address is referred to as an active user. Users of 
an account are provided service by their home sas, which were selected 
by the customer. Therefore, an account may span several sas according 
to the distributed nature of the customer's business. 
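
As an illustration of the naming scheme, the following sketch splits a user name into its account, group, and home address parts. The separators ("." between account and group, ":" before the home address) are inferred from the example above, and `UserName` and `parse_user_name` are hypothetical names.

```python
from typing import NamedTuple


class UserName(NamedTuple):
    account: str       # e.g. BELL
    group: str         # e.g. BELL.RESEARCH (the account name is part of it)
    home_address: str  # ten-digit network address


def parse_user_name(name: str) -> UserName:
    """Split 'ACCOUNT.GROUP:home-address' into its three parts."""
    group, sep, home_address = name.partition(":")
    if not sep or not group or not home_address:
        raise ValueError(f"expected 'group:home-address', got {name!r}")
    # The account name is the leading component of the group name.
    account = group.split(".", 1)[0]
    return UserName(account, group, home_address)


print(parse_user_name("BELL.RESEARCH:201-582-9999"))
```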

Each user is assigned a data profile. Actually, a user's data profile is 
constructed from his/her account, group, and home address profiles. 
A user profile contains a list of the service features subscribed to by 
the user and some status information. These may include features to 
transfer information between users (communication features such as 
"call setup" and "send message") and/or features to control other 
users' service capabilities (subnetwork management features such as 
"disable an account, a group, or another user").

2.2 The authorization model

A shared network must preserve the rights of its users. The authorization
model defines the format of the authorization policy specified
by the customer, and the enforcement mechanism used to control user 
access to the service. 

2.2.1 The authorization policy

The collection of users, network addresses, and groups associated 
with an account is referred to as a logical subnetwork. The authori- 
zation policy of an account is the database representation of its logical 
subnetwork. Logical subnetworks can be created or removed only 
through service provisioning, i.e., changes in the database. The capa- 
bility of a user to modify the authorization policy of an account is 
referred to as subnetwork management. The operations support users 
are also organized in one or more accounts. These accounts are 
identified through reserved names and associated users may be as- 
signed to customers. Subnetwork management is used by operations 
support users to create, modify, and remove logical subnetworks. 
Customers are provided access to some limited customer subnetwork 
management capability. Users who can exercise customer subnetwork
management capabilities are referred to as administrators. Adminis- 
trators are restricted in their scope of control to their own customer 
subnetworks. For example, an administrator in one account cannot 
disable service to a user from another account. An administrator can 
modify the authorization data associated with existing users, but he/ 
she cannot create or remove either a user or a network address. 

We recall that an account consists of one or more groups. The 
implementation supports account and group administrators. An ac- 
count administrator can exercise control over his/her whole account. 
For example, the account administrator can disable service to his/her 
whole account, to a group, or to a specific user. A group administrator 
is assigned control over one or more groups within his/her own 
account, but a group can be assigned only to a single group adminis- 
trator. Group administrators can exercise subnetwork management 
functions only over groups under their control. Group administrators, 
by definition, cannot be members of a group under their control. The 
subnetwork management capabilities available to customer adminis- 
trators differ only in their scope of control. For example, an account 
administrator of account A1 has the same capabilities as an account
administrator of account A2. But each administrator is restricted to 
exercise his/her capabilities within his/her own account. 

In summary, the authorization policy of an account is stored in a set 
of data profiles associated with logical subnetwork entities such as 
users, network addresses, and groups. 
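
The scope rules for account and group administrators can be sketched as a single check. This is a simplified model under stated assumptions: the class `Administrator`, the function `may_manage`, and the use of `None` to mark an account administrator are illustrative choices, not the data layout of the implementation.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Administrator:
    account: str
    # Groups under a group administrator's control.
    # None marks an account administrator (assumed encoding).
    groups: Optional[FrozenSet[str]] = None


def may_manage(admin: Administrator, account: str,
               group: Optional[str] = None) -> bool:
    """True if `admin` may apply a subnetwork management function.

    `group=None` means the target is the whole account.
    """
    if admin.account != account:
        return False   # scope never extends beyond the administrator's account
    if admin.groups is None:
        return True    # account administrator: the whole account is in scope
    # Group administrator: only groups explicitly under his/her control.
    return group is not None and group in admin.groups
```

For example, `may_manage(Administrator("A1"), "A2")` is false, which mirrors the rule that an administrator of account A1 cannot act on account A2.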

2.2.2 The enforcement mechanism 

Fig. 2: The enforcement mechanism.

The authorization policy is used by the enforcement mechanism to
authorize all user service requests. This applies uniformly to requests
issued by customer or operations support users. With respect to 
customers, the enforcement mechanism is invoked to authorize both 
communication and subnetwork management service requests. Figure 
2 illustrates the enforcement mechanism and the related authorization 
policy as defined by the model. A customer service request is author- 
ized using the policy stored in the related user profile. To minimize the 
enforcement delay associated with a service request, a copy of the user 
profile is cached in main memory at the service area used by the active 
user to access the network. The following section describes the authorization
database architecture used to maintain software-defined
subnetworks. 

III. DATABASE ARCHITECTURE 

3.1 Is distributed database management necessary?

The application described in Section II specifies a use of data 
management to support the maintenance of user profiles. Whether to
distribute database management was one of the most important
design decisions. An evaluation of the application showed that distributed
database management is necessary for four major reasons: load
sharing between the nodes, enforcement mechanism performance, 
communication service reliability, and node software standardization. 

User profiles must be available at the node which provides service 
to support authorization enforcement. If all user profiles were stored 
at a single node, the system could not meet its performance and 
availability objectives. The node that maintains all user profiles may 
become a bottleneck whenever large numbers of users wish to initiate 
service concurrently. It was also concluded that it would be disadvan- 
tageous to develop and maintain two different node software versions, 
one for the node maintaining the profiles, and another for the nodes 
providing communication services. Therefore, it was decided to distribute
the database management load associated with user profiles
across all service areas (all nodes). 

3.2 Physical distribution of data 

User profile data is distributed at three different levels to minimize 
enforcement delay associated with communication service requests, to 
improve availability of the service, and to maintain consistency of 
replicated data elements. At any point in time, a user's profile may 
exist at: the account's sa (first level), the user's home sa (second level), 
and the access sas that support all related active users (third level). 
Data at the first level is used to coordinate the update of replicated 
data elements that have been stored at all three levels. Data at the 
second level is used to enhance the availability of service. As long as 
the user's home sa is operational, a user may be guaranteed service by 
this site. Data at the third level is stored in volatile memory and is 
used mainly to decrease the delay associated with enforcement. 

3.2.1 The first level 

All authorization data associated with an account (e.g., group pro- 
files, user profiles, address profiles, etc.) are stored in a customer 
authorization database (cadb) at a specific service area. This service 
area is referred to as the account's sa (or the first level). The cadb of 
a service area may contain the authorization data of one or more 
accounts. The authorization model does not allow any logical data 
dependencies between accounts. This design decision significantly
simplifies the maintenance of data integrity of logically related profiles
within an account. For example, the system rejects a "create user" 
transaction if the related group and home address profiles do not exist. 
The account's sa is selected by operations support personnel according 
to load-balancing considerations, independently of the service areas 
that will provide service to users. 

The primary node concept for multicopy updates was chosen to 
support profile updates. 3-6 All changes in authorization information 
will occur first in the cadb, and will be propagated to the other levels 
if necessary. 

3.2.2 The second level 

Whenever a user is created, a user profile is downloaded from the 
cadb to the user's home sa (or the second level), and stored in
nonvolatile memory. All secondary copies at a service area are stored
in the secondary copy database (scdb). An scdb includes profiles of all
users homing on this service area, independently of their account 
membership. Users can be provided service as long as their home sa 
is operational, even if their account's SA is not. 

Fig. 3: Primary copy concept (multiple copy updates).

Secondary profile copies can be updated only through the account's 
cadb. Therefore, subnetwork management requires that both the 
user's home sa and the account's sa be operational. Figure 3 shows an 
instance of the primary copy concept. Profile Fa has its primary copy
installed at SA1 (Fa'), and secondary copies exist at SA1, SA2, and SA3.
Similarly for profile Fb, the primary copy is installed at SA2 (Fb'), and
secondary copies exist at SA3 and SA4. A secondary copy of profile Fa is
installed at SA1, alongside the primary copy, because the primary
copy is not used to support enforcement. This decision simplifies the 
development of the software responsible for migrating cadbs and
scdbs independently from one node to another. For example, the
cadbs of several accounts can be homed to a new service area at a 
different node without any changes in the content of the scdbs at the 
old node. Network growth and/or load balancing considerations will 
require the migration of databases between nodes. 

3.2.3 The third level 

Fig. 4: User profiles.

A user may be provided service through a network address only
after a log-on procedure verifies his/her identity. If log-on is successful,
a session associated with an active user is established. During a session,
an active user may submit one or more service requests. For each
active user, a copy of his/her profile is cached in main memory at the
access service area (or the third level). This may or may not be the
home service area of the user. The cached profile copy is created to
further minimize enforcement delay associated with communication 
service requests. The active user profile is kept in volatile memory for 
the duration of the session. The system can provide service concur- 
rently to one or more active instances of a user. The user's home sa 
keeps track of all active instances of a user. Figure 4 shows account A1 
homing on service area SA1. Two users of this account, U1 and U2, 
home on service area SA6 and SA3, respectively. An active instance of 
user U2 is created when he/she requests service through network 
address NA3, associated with service area SA5. Other active users, 
U1:NA1 and U2:NA2, are provided access through service areas SA6
and SA3, respectively. 

Each node maintains a routing table which identifies the sas of all 
accounts. Update transactions are routed, using this table, to the 
appropriate account's cadb. Each profile in the cadb maintains a list 
of sas, where related secondary copies have been installed. This 
information is used to propagate update transactions to the second 
level. 
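
The two lookups described above can be sketched as follows. The table contents are made-up sample data, and `routing_table`, `secondary_sites`, and `plan_update` are hypothetical names; in the system described, the list of secondary sites is kept with each profile in the cadb rather than in a separate table.

```python
from typing import Dict, List, Set, Tuple

# Node routing table: account name -> service area holding the account's cadb.
routing_table: Dict[str, str] = {"BELL": "201-777"}

# Per-profile list of service areas where secondary copies are installed.
secondary_sites: Dict[str, Set[str]] = {
    "BELL.RESEARCH:201-582-9999": {"201-582"},
}


def plan_update(profile_name: str) -> Tuple[str, List[str]]:
    """Return (account's sa, second-level sas) for an update of one profile."""
    # The account name is the leading component of any profile name.
    account = profile_name.split(":", 1)[0].split(".", 1)[0]
    cadb_sa = routing_table[account]  # where the update is routed first
    targets = sorted(secondary_sites.get(profile_name, set()))
    return cadb_sa, targets


print(plan_update("BELL.RESEARCH:201-582-9999"))
```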

Fig. 5: Functional partitioning of the software.

Changes to a profile at the third level take effect after session
termination of the affected active user. All new sessions are established
using the updated version of the profile at the second level. To
minimize interference during a session, the administrator may or may
not request immediate activation of the change. If the immediate option
is specified, sessions of all affected active users are terminated. If the
immediate option is not specified, the update at the second level will
affect only future sessions.
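
The effect of the immediate option on cached profiles can be sketched as below. This is a simplified single-process model: `AccessServiceArea` and its methods are hypothetical, and a plain dictionary stands in for the second-level profile store.

```python
from typing import Dict, List, Tuple

Profile = Dict[str, bool]  # feature name -> subscribed?


class AccessServiceArea:
    def __init__(self, second_level: Dict[str, Profile]) -> None:
        self.second_level = second_level  # stand-in for the home sa's scdb
        # Third level: one cached profile copy per active user (user, address).
        self.sessions: Dict[Tuple[str, str], Profile] = {}

    def log_on(self, user: str, address: str) -> None:
        # A new session always starts from the current second-level profile.
        self.sessions[(user, address)] = dict(self.second_level[user])

    def authorize(self, user: str, address: str, feature: str) -> bool:
        # Enforcement reads only the cached copy.
        return self.sessions[(user, address)].get(feature, False)

    def profile_changed(self, user: str, immediate: bool) -> List[str]:
        """Handle a second-level update; return addresses whose sessions ended."""
        if not immediate:
            return []  # running sessions keep their cached copy
        ended = [addr for (u, addr) in self.sessions if u == user]
        for addr in ended:
            del self.sessions[(user, addr)]
        return ended
```

Without the immediate option, a session opened before the update keeps authorizing against the old cached copy until it terminates.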

In the rest of this paper, discussions will be restricted to issues 
associated with the first and second levels of the database architecture. 

IV. SOFTWARE ARCHITECTURE 

The UNIX operating system was chosen as the implementation
environment. 7 At the time of the study, a commercial database management
system running under the UNIX operating system was not
available. To limit development resources, a decision was made to 
concentrate most of the effort on the distributed aspects of the prob- 
lem. The centralized database management system supports file direc- 
tories, sequential files, and single key files, built over the UNIX file 
system. The assumption was that centralized database management 
could be upgraded in the future, once a commercial package became 
available. Figure 5 shows the functional partitioning of the software. 
The design effort concentrated on key issues such as multicopy up- 
dates, concurrency control, and crash recovery. Tight dependencies 
were detected between these issues. Similar dependencies were re- 
ported in Ref. 6. Section V presents some of these dependencies and 
their impact on implementation. 

4.1 Process structure

The two functions associated with multicopy updates of the cadb 
and scdb are supported by two data managers. The first manager, 
referred to as the primary copy data manager (pcdm), updates profiles 
at the first level and coordinates the distribution of secondary copy 
updates to the affected scdbs. The second manager, referred to as the 
secondary copy data manager (scdm), updates profiles at the second 
level — stored in the scdb — in response to pcdm requests and supports 
enforcement at the user's home service area. The two data managers 
are implemented as separate processes because of a size limitation
imposed by the operating system. Figure 6 shows the uniform process 
structure selected for all service areas. 

Fig. 6: Data management process structure.

The design of both data managers follows a layered approach with 
well-defined interfaces. This facilitated parallel development of the 
layers, and allows easy upgrading of existing layers in the future. 
Database updates of first-level copies are supported through four 
layers. The first layer provides local data access and secondary storage 
management. The second layer (pcdm) supports centralized control of 
updates, data consistency within the primary copy's node, concurrency 
in transaction processing, deadlock prevention, and crash recovery. 
The third layer allows data distribution to be transparent to node data 
managers. The fourth layer, as a user interface, transforms user queries 
into database transactions. This layer also includes an access control 
mechanism based on the capability model. 8,9 

Enforcement requests and updates of secondary copies are sup- 
ported through two layers. The first layer provides local data access 
(the same one as for the first-level copies). The second (scdm) updates 
secondary copies and supports enforcement. Figure 7 shows the data- 
base management system's layered architecture. 

Fig. 7: System software architecture.

V. TRANSACTION PROCESSING

The design process was implementation driven. In all cases in which
a simple solution was available, it was adopted. Initially, we decided to
review issues one at a time. However, we soon discovered that some
aspects of distributed data management are interrelated and cannot
be examined separately. For example, the update algorithm had to be 
revised numerous times while crash recovery was investigated. As long 
as crash recovery was not considered, all update algorithms were 
satisfactory. The following sections review the transaction types sup- 
ported, the update algorithm, and the crash-recovery mechanism. 

5.1 Input transaction types

The pcdm accepts three types of transactions. (i) Customer subnetwork
transactions are issued by an administrator to exercise control
over his/her logical subnetwork. These include access requests to
either verify or modify the status of the logical subnetwork. (ii)
Process control transactions are issued by maintenance personnel to
monitor and control pcdm functionality. For example, these transactions
can be used to disable the processing of a particular type of
customer subnetwork transaction for all customers because of a software
problem. (iii) Response transactions are issued by scdms in
response to pcdm-initiated update transactions.

Process control and response transactions have a uniform high 
priority, and are processed before all new subnetwork management 
transactions. Customer subnetwork transactions are processed by the 
pcdm at one of three priorities. Administrators may specify the priority
of each transaction submitted. If not specified, a default priority is
assigned by the system uniformly for all customers.

1. LOCK MASTER COPY RECORD(S)
2. UPDATE MASTER COPY RECORD(S)
3. PROPAGATE CHANGES TO THE SECOND LEVEL(S)
   WAIT FOR ACKNOWLEDGEMENT(S)
4. UNLOCK MASTER COPY RECORD(S)
5. ACKNOWLEDGE TRANSACTION

Fig. 8: Update algorithm.

The scdm accepts three types of transactions. (i) Update transactions
are issued by pcdms to create, remove, or modify profiles in the
scdb. (ii) Process control transactions are issued by maintenance
personnel to monitor and control scdm functionality. (iii) Enforcement
transactions are issued by other processes to verify the capabilities of
a user in the context of a service request. Requests to initiate a
session or to make a call to another user are examples of requests that
have to be authorized by the scdm.

Process control and enforcement transactions have a uniform high 
priority, and are processed before all new update transactions. All 
update transactions are processed at a single-priority level because 
they are issued by the pcdm. 
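
The priority rules for pcdm input can be sketched with a heap-ordered queue. The numeric ranks, the default of 2, and the names `TransactionQueue`, `submit`, and `next_transaction` are assumptions made for illustration; the text fixes only that process control and response transactions precede customer transactions, and that customers choose among three priorities.

```python
import heapq
import itertools
from typing import List, Optional, Tuple

HIGH = 0                       # process control and response transactions
DEFAULT_CUSTOMER_PRIORITY = 2  # assumed default, applied to all customers


class TransactionQueue:
    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str, str]] = []
        self._seq = itertools.count()  # keeps arrival order within a rank

    def submit(self, kind: str, payload: str,
               priority: Optional[int] = None) -> None:
        if kind in ("process_control", "response"):
            rank = HIGH
        elif kind == "customer":
            rank = DEFAULT_CUSTOMER_PRIORITY if priority is None else priority
            if rank not in (1, 2, 3):
                raise ValueError("customer priority must be 1, 2, or 3")
        else:
            raise ValueError(f"unknown transaction type: {kind}")
        heapq.heappush(self._heap, (rank, next(self._seq), kind, payload))

    def next_transaction(self) -> Tuple[str, str]:
        _, _, kind, payload = heapq.heappop(self._heap)
        return kind, payload
```

The same structure would serve the scdm by treating enforcement transactions as high priority and update transactions as a single lower rank.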

5.2 The update algorithm 

The update algorithm handles transactions that add, modify, and
remove profiles. Each update transaction is routed to the pcdm at
the account's home service area. The pcdm updates the first-level 
copy(ies) in the cadb and coordinates the update of all related second- 
ary copies if necessary. Once all secondary copies have been updated, 
the pcdm returns an acknowledgment to the user interface process 
which originated the transaction. Figure 8 describes the update algorithm.
The algorithm is similar to phase one of the two-phase commit
protocol. 2 Unlike in that protocol, however, an scdm commits the change
as soon as it receives the transaction, so its response to the pcdm
reports a committed change rather than a readiness to commit.
If one or more scdms do not acknowledge completion, the
pcdm terminates the transaction and notifies node maintenance of the 
potential existence of a database inconsistency. Node maintenance is 
provided with software tools to restore database consistency. The role 
of these tools is further discussed as part of the recovery strategy. 
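
The five steps of the update algorithm can be sketched in a single process as follows. The classes `Pcdm` and `Scdm`, the `notify` hook, and the in-memory dictionaries are illustrative stand-ins. What happens to the master copy and its lock after a missing acknowledgment is not spelled out above; this sketch assumes the master copy stays updated and the lock is released.

```python
from typing import Callable, Dict, Iterable, Set


class Scdm:
    """Stand-in for a secondary copy data manager at one service area."""

    def __init__(self, responsive: bool = True) -> None:
        self.scdb: Dict[str, str] = {}
        self.responsive = responsive

    def apply(self, key: str, value: str) -> bool:
        if not self.responsive:
            return False        # models a missing acknowledgment
        self.scdb[key] = value  # the change is committed before acknowledging
        return True


class Pcdm:
    def __init__(self, scdms: Dict[str, Scdm],
                 notify: Callable[[str], None]) -> None:
        self.cadb: Dict[str, str] = {}
        self.locked: Set[str] = set()
        self.scdms = scdms
        self.notify = notify    # hypothetical hook to node maintenance

    def update(self, key: str, value: str, sites: Iterable[str]) -> bool:
        if key in self.locked:
            raise RuntimeError("record locked; retry later")
        self.locked.add(key)                  # 1. lock master copy record
        try:
            self.cadb[key] = value            # 2. update master copy record
            silent = [s for s in sites        # 3. propagate and wait for acks
                      if not self.scdms[s].apply(key, value)]
            if silent:
                self.notify(f"possible inconsistency for {key} at {silent}")
                return False
            return True                       # 5. acknowledge transaction
        finally:
            self.locked.discard(key)          # 4. unlock master copy record
```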

5.2.1 Locking 

The locking mechanism supported by the system has four major 
functions. First, it is used for concurrency control to enforce the
serializability of multisource updates. 10,11 For example, different active
users may submit authorized update transactions related to the same 
profile. The locking information is maintained in a system lock table 
and not with the data entities being updated. 

Second, it is used to recover update transactions that have been 
abnormally terminated as a result of a system failure. The lock table 
is maintained in nonvolatile memory. If the lock table survives the
crash, it is used by the recovery process to identify potential database 
inconsistencies. The lock table contains sufficient information to re- 
store database consistency using roll forward or backward techniques. 12 

Third, it is used to prevent data-resource-related deadlocks. 13 Each 
update transaction locks a priori all data elements that are to be 
modified at the cadb, in the first step of the update algorithm. Most 
updates require the locking of two or three records in the cadb. The 
system supports locking granularity at the database, the file, and the 
record level. Using the primary copy concept, locking at the first level 
also implies locking at the other levels. Therefore, it is not necessary 
to acquire and release data locks across different nodes. If a transaction 
cannot acquire all the locks it needs, the system suspends its process- 
ing, and an attempt is made by the system to retry it later. 
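
A priori locking can be sketched as an all-or-nothing acquisition: a transaction either obtains every lock it needs or holds none, so it never waits while holding a lock. The class `LockTable` and its methods are hypothetical, and the sketch covers record-level locks only.

```python
from typing import Dict, Iterable


class LockTable:
    def __init__(self) -> None:
        self._owner: Dict[str, str] = {}  # record id -> transaction id

    def try_lock_all(self, txn: str, records: Iterable[str]) -> bool:
        """Acquire every requested lock, or none of them."""
        wanted = set(records)
        # Refuse if any record is held by a different transaction.
        if any(self._owner.get(r, txn) != txn for r in wanted):
            return False  # caller suspends the transaction and retries later
        for r in wanted:
            self._owner[r] = txn
        return True

    def release_all(self, txn: str) -> None:
        for r in [r for r, owner in self._owner.items() if owner == txn]:
            del self._owner[r]

    def holder(self, record: str) -> str:
        return self._owner.get(record, "")
```

Because a refused transaction holds nothing, no cycle of transactions waiting on each other's locks can form.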

Fourth, the locking mechanism is used to enforce the cadb order of 
updates at the respective scdbs. The communication network used to 
propagate changes from pcdms to scdms does not ensure the delivery
of transactions in the order they were sent out. 14 To prevent "race" 
conditions, the update algorithm releases the locks only after all 
secondary copies have been updated. 

To prevent locking of data for long periods of time, the pcdm times
out each update transaction propagated to an scdm. If an acknowledgment
is not received within a predefined time interval, the transaction
is aborted and node maintenance is notified. Retransmitting update 
transactions to nonresponding scdms was not found useful. The loss of 
the initial update transaction is the only case where a retransmission 
is useful, but this is a rare occurrence in existing communication 
networks. Whenever the scdm is overloaded or out of service, retrans- 
missions may actually worsen the situation. 
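
The time-out rule can be sketched with a blocking queue read. The function name `wait_for_ack` and the use of an in-process queue as the acknowledgment channel are assumptions; the point illustrated is a single bounded wait with no retransmission.

```python
import queue


def wait_for_ack(acks: "queue.Queue[str]", timeout_s: float) -> bool:
    """Wait once for an scdm acknowledgment; never retransmit."""
    try:
        acks.get(timeout=timeout_s)
        return True
    except queue.Empty:
        # Time-out: the caller aborts the transaction and notifies
        # node maintenance instead of sending the update again.
        return False
```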

5.3 Crash recovery 

The authorization policy is stored at three levels. Failures which 
disable access to these databases may result in communication and/or 
subnetwork management service interruption. This makes the recov- 
ery mechanism a critical element of the database management system. 
The main objectives of the recovery mechanism are to restore service 
by a service area as soon as possible, and to minimize the permanent 
loss of data. It is also important to eliminate recovery dependencies
between nodes of the network. A node should be able to recover on 
command, by itself, and restore service. The design assumes that 
database inconsistencies will occur, but service does not depend on 
complete consistency. The software is designed to detect database 
inconsistencies during normal transaction processing. Once an incon- 
sistency is detected, node maintenance is notified. Node maintenance 
is provided a set of synchronization transactions to restore consistency 
within the first level, or between the first and second levels. These 
transactions are similar to regular update transactions, and differ only 
in the way they handle error conditions. Only after this manual method 
proves itself in the field will we consider automatic removal of database 
inconsistencies. 

Appendix A provides a summary of the components used in recovery. 
Appendix B describes the pcdm and scdm recovery algorithms used to 
restore service. 

VI. SUMMARY 

The development of a distributed database management system and 
of a subnetwork management application has been completed. The 
study demonstrated the feasibility of software-defined subnetworks
associated with a single nationwide network. We presented in this 
paper distributed database techniques and the feasibility of their 
implementation. These integrated database techniques are currently 
applied in the design of new communication services. 

One of the issues not addressed in this paper is the development of 
tools for testing and debugging in a distributed processing system. The 
effort required to develop these tools was equal in scope to the 
development of both the pcdm and scdm. The results in this area will 
be reported separately. 

APPENDIX A

Components of the Recovery Mechanism 

Each node is equipped with the following components to be used in 
the recovery process: 

(i) Duplex hardware is provided to protect service against single
hardware failures.

(ii) The operating system maintains two copies of each data element
on separate disk drives.

(iii) The node periodically creates a local, off-line backup copy of
the database for checkpointing. 12

(iv) During transaction processing, the system maintains on disk
the lock table and a completed update transaction log. The completed 
update transaction log is maintained at the physical page level and is 
used only by the recovery process. Physical logging was selected to 
improve performance during crash recovery when the database is 
rolled forward. The log maintains sufficient information to roll the 
database both forward and backward. 12 The log is kept until the next 
checkpoint is established. Audit trails for subnetwork management 
transactions are maintained separately at the logical level by the 
application programs. 

(v) Node maintenance has a set of synchronization transactions 
for restoring database consistency at all levels. 
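
A page-level log that supports both directions can be sketched as below. Storing a before image and an after image for each page write is one common way to achieve this and is assumed here; the log format actually used is not given above, and `PageLogEntry`, `roll_forward`, and `roll_backward` are hypothetical names.

```python
from typing import Dict, List, NamedTuple


class PageLogEntry(NamedTuple):
    page: int
    before: bytes  # page contents prior to the update
    after: bytes   # page contents once the update completed


def roll_forward(pages: Dict[int, bytes], log: List[PageLogEntry]) -> None:
    """Reapply logged writes, oldest first, on top of a checkpoint copy."""
    for entry in log:
        pages[entry.page] = entry.after


def roll_backward(pages: Dict[int, bytes], log: List[PageLogEntry]) -> None:
    """Undo logged writes, newest first, restoring the checkpoint state."""
    for entry in reversed(log):
        pages[entry.page] = entry.before
```

Replaying whole page images avoids re-executing transaction logic during recovery, which is consistent with the performance argument for physical logging.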

APPENDIX B

The Recovery Algorithm 

Using the recovery components specified in Appendix A, service can 
be restored in the following way: 
(i) At the pcdm 

a. Check file consistency on disk. 

b. If both disk copies are lost, install backup copy, and inter- 
nally run the completed update transaction log. Interactions 
with other processes, as part of normal transaction process- 
ing, are not necessary. 

c. Use the lock table to back out prematurely terminated 
transactions (synchronize within the first-level database and 
between the first-level and second-level databases).

d. Renew subnetwork management service. 

e. Detect inconsistencies during normal transaction processing 
and notify node maintenance. 

(ii) At the scdm 

a. Check file consistency on disk. 

b. If both copies of the database on disk are lost, install backup, 
and internally run the completed update transaction log. 
Interactions with other processes, as part of normal trans- 
action processing, are not necessary. 

c. Renew communication service. 

d. Inconsistencies detected during normal transaction process- 
ing are forwarded to node maintenance. 
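
Steps a through d of the pcdm procedure can be sketched as one function. Several simplifications are assumed: a lost disk copy is represented by `None`, the completed update log is modeled at the record level rather than the physical page level, each lock table entry carries the value to restore, and synchronization with second-level databases is left out. The name `recover_pcdm` is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Database = Dict[str, str]


def recover_pcdm(disk_copies: List[Optional[Database]],
                 backup: Database,
                 completed_log: List[Tuple[str, str]],
                 lock_table: Dict[str, Optional[str]]) -> Database:
    # a. Check the disk copies and use the first one that survived.
    db = next((dict(c) for c in disk_copies if c is not None), None)
    if db is None:
        # b. Both copies lost: install the backup and replay the log locally.
        db = dict(backup)
        for key, value in completed_log:
            db[key] = value
    # c. Back out transactions that still held locks when the node failed.
    for key, before in lock_table.items():
        if before is None:
            db.pop(key, None)  # the record did not exist before the update
        else:
            db[key] = before
    lock_table.clear()
    # d. The returned database is ready for service to be renewed.
    return db
```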